DOCUMENT RESUME

ED 106 317                                                TM 004 423

AUTHOR          Larkin, Kevin C.; Weiss, David J.
TITLE           An Empirical Comparison of Two-Stage and Pyramidal
                Adaptive Ability Testing.
INSTITUTION     Minnesota Univ., Minneapolis. Dept. of Psychology.
SPONS AGENCY    Office of Naval Research, Washington, D.C. Personnel
                and Training Research Programs Office.
REPORT NO       RR-75-1
PUB DATE        Feb 75
NOTE            36p.; For a related document, see TM 004 386
AVAILABLE FROM  Psychometric Methods Program, Dept. of Psychology,
                University of Minnesota, Minneapolis, Minnesota 55455
                (while supplies last)
EDRS PRICE      MF-$0.76 HC-$1.95 PLUS POSTAGE
DESCRIPTORS     *Ability; Aptitude Tests; *Comparative Analysis;
                *Computer Programs; Guessing (Tests); Higher
                Education; *Individual Differences; Scoring; Scoring
                Formulas; Test Construction; *Testing; Testing
                Problems; Test Reliability; Tests; Test Validity
IDENTIFIERS     Pyramidal Testing; Two Stage Testing



ABSTRACT

A 15-stage pyramidal test and a 40-item two-stage
test were constructed and administered by computer to 111 college
undergraduates. The two-stage test was found to utilize a smaller
proportion of its potential score range than the pyramidal test.
Score distributions for both tests were positively skewed but not
significantly different from the normal distribution. The pyramidal
test's score distributions tended to be platykurtic while the
two-stage test's distribution tended to be leptokurtic. The
assignment of subjects to measurement subtests in the two-stage test
was more accurate than in a previous empirical investigation since
the misclassification rate was less than 1%. Comparison of scoring
methods for the pyramidal strategy supported earlier findings that
the average difficulty scoring methods were most useful. The
correlations between scores on the two adaptive strategies ranged
from r=.79 to .84. Both adaptive strategies appeared to adapt item
difficulties to individual differences in abilities so as to reduce
chance effects due to guessing. The pyramidal strategy seemed to be
slightly more successful in eliminating guessing than the two-stage
strategy. Results are discussed with respect to internal consistency
reliabilities, stabilities, and the relation of each strategy to
conventional testing. Simulation studies are suggested to further
delineate the optimum characteristics of each testing strategy.
(Author)






AN EMPIRICAL COMPARISON OF
TWO-STAGE AND PYRAMIDAL ADAPTIVE
ABILITY TESTING






U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT.
POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT
OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY



Kevin C. Larkin 
and 

David J. Weiss 



Research Report 75-1 

Psychometric Methods Program 
Department of Psychology 
University of Minnesota 
Minneapolis, MN 55455

February 1975 






Prepared under contract No. N00014-67-A-0113-0029
NR No. 150-343, with the Personnel and
Training Research Programs, Psychological Sciences Division 

Office of Naval Research 






Approved for public release; distribution unlimited. 
Reproduction in whole or in part is permitted for
any purpose of the United States Government.






Unclassified 



SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)



REPORT DOCUMENTATION PAGE




1. REPORT NUMBER    2. GOVT ACCESSION NO.

Research Report 75-1 


3. RECIPIENT'S CATALOG NUMBER


4. TITLE (and Subtitle)

An Empirical Comparison of Two-stage and
Pyramidal Adaptive Ability Testing 


5. TYPE OF REPORT & PERIOD COVERED

Technical Report 


6. PERFORMING ORG. REPORT NUMBER


7. AUTHOR(s)

Kevin C. Larkin and David J. Weiss


8. CONTRACT OR GRANT NUMBER(s)

N00014-67-0113-0029 


9. PERFORMING ORGANIZATION NAME AND ADDRESS

Department of Psychology 
University of Minnesota 
Minneapolis, Minnesota 55455 


10. PROGRAM ELEMENT, PROJECT, TASK
AREA & WORK UNIT NUMBERS

P.E.: 61153N PROJ.: RR042-04
T.A.: RR042-04-01
W.U.: NR150-343


11. CONTROLLING OFFICE NAME AND ADDRESS

Personnel and Training Research Programs 
Office of Naval Research 
Arlington, Virginia . 22217 


12. REPORT DATE

February 1975 


13. NUMBER OF PAGES

27 


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)


15. SECURITY CLASS. (of this report)


15a. DECLASSIFICATION/DOWNGRADING
SCHEDULE



16. DISTRIBUTION STATEMENT (of this Report)



Approved for public release; distribution unlimited. Reproduction in whole
or in part is permitted for any purpose of the United States Government.



17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)



18. SUPPLEMENTARY NOTES



19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

testing sequential testing programmed testing 

ability testing branched testing response-contingent testing 

computerized testing individualized testing automated testing 

adaptive testing tailored testing 



20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

A fifteen-stage pyramidal test and a 40-item two-stage test were constructed
and administered by computer to 111 college undergraduates. The two-stage
test was found to utilize a smaller proportion of its potential score range
than the pyramidal test. Score distributions for both tests were positively
skewed but not significantly different from the normal distribution. The
pyramidal test's score distributions tended to be platykurtic while the
two-stage test's distribution tended to be leptokurtic. The






DD 1 JAN 73 1473   EDITION OF 1 NOV 65 IS OBSOLETE   Unclassified






assignment of subjects to measurement subtests in the two-stage test was more
accurate than in a previous empirical investigation since the misclassification
rate was less than 1%. Comparison of scoring methods for the pyramidal
strategy supported earlier findings that the average difficulty scoring methods
were most useful. The correlations between scores on the two adaptive
strategies ranged from r=.79 to .84. Both adaptive strategies appeared to
adapt item difficulties to individual differences in abilities so as to reduce
chance effects due to guessing. The pyramidal strategy seemed to be slightly
more successful in eliminating guessing than the two-stage strategy. Results
are discussed with respect to internal consistency reliabilities, stabilities,
and the relation of each strategy to conventional testing. Simulation studies
are suggested to further delineate the optimum characteristics of each testing
strategy.






Contents 



Introduction

Pyramidal Tests 1

Two-stage Tests 3

Research Comparing Two-stage and Pyramidal Tests 5

Purpose 6

Method 7

Test Construction 7

Item Pool 7

Construction of the Two-stage Test 8

Routing Test 8

Measurement Tests 9

Scoring 10

Construction of the Pyramidal Test 11

Scoring 11

Test Administration and Subjects 12

Analysis 12

Order Effects 12

Characteristics of Score Distributions 12

Relationships Between Two-stage and Pyramidal Scores 13

Internal Consistency Reliability 13

Misrouting 14

Intercorrelations of Pyramidal Scores 14

Results 14

Order Effects 14

Score Distributions 15

Pyramidal Test 15

Two-stage Test 16

Relationship Between Two-stage and Pyramidal Scores 17

Internal Consistency Reliability 18

Misrouting 19

Intercorrelations of Pyramidal Scores 19

Discussion and Conclusions 20

References

Appendix A. Difficulty and Discrimination Item Parameters for Two-stage Test 26

Appendix B. Difficulty and Discrimination Item Parameters for Pyramidal Test 27



AN EMPIRICAL COMPARISON OF TWO-STAGE
AND PYRAMIDAL ADAPTIVE ABILITY TESTING



The administration of ability test items by means of an interactive
computer system has enabled test administrators to tailor or adapt tests to
individual differences in testee ability. Items are selected by a set of
rules or "strategy" determined prior to testing (see Weiss, 1974, for a
discussion of the various adaptive testing strategies). At one or more
points in the testing, a testee's responses to previously administered items
are evaluated, and a tentative estimate of ability is made. Subsequent
items are generally selected so that their difficulties are close to the
testee's estimated ability. This procedure permits testing time to be
shortened in comparison to conventional paper and pencil methods of testing
without reducing either the reliability or validity of the test. Computerized
adaptive testing also has other advantages over conventional tests (see
Weiss and Betz, 1973).

Empirical research on two adaptive strategies, the pyramidal test and 
the two-stage test, has been reported in the present series of research 
papers (Betz & Weiss, 1973; Larkin & Weiss, 1974). In both of these studies, 
the adaptive test was compared to a conventional test on a number of 
psychometric criteria. The present study directly compares the two adaptive 
strategies using the same group of subjects.

Pyramidal Tests 

The pyramidal testing method structures items into a triangular con-
figuration according to item difficulties. Item administration follows the
general branching rule that a more difficult item follows a correct response
while an easier item follows an incorrect response. Figure 1 illustrates a
typical pyramidal test. The first item administered is at the top of the
pyramidal structure and is usually one of median difficulty (proportion
correct, p=.50) based on previous item analyses. The second item administered
to any testee depends on whether his/her response to the first item is
correct or incorrect. If the testee answers the first item correctly, a
more difficult item (p=.45) is presented next. An item of lesser difficulty
(p=.55) is presented next if the initial item is answered incorrectly.
Thus, there are two items available at the second level or "stage" of the
pyramid. Branching to the third stage depends on the correctness of the
response to the second-stage item. This process is repeated until the
testee has attempted one item at each of a fixed number of stages.

The increment in difficulty following a correct response in Figure 1 is 
equal to the decrement in difficulty following an incorrect response. Thus, 
branching within this pyramidal structure uses an "equal offset." Unequal 
offsets with smaller increments than decrements can be used as a correction 
for guessing (Weiss, 1974, p. 16). 
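The branching rule and the two kinds of offset can be sketched as follows. This is a minimal illustration, not the program used in the study: difficulties are assumed to be on a scale where larger values mean harder items (for example, normal ogive b rather than proportion correct), and `respond` is a hypothetical stand-in for the testee. Setting `up == down` gives an equal-offset pyramid; `up < down` gives the unequal offset described as a correction for guessing.

```python
from typing import Callable, List, Tuple

def administer_pyramid(
    n_stages: int,
    start: float,
    up: float,
    down: float,
    respond: Callable[[float], bool],
) -> List[Tuple[float, bool]]:
    """Give one item per stage: harder after a right answer, easier after a wrong one.

    Assumes larger difficulty values mean harder items. The item reached depends
    only on how many answers so far were right and wrong, so stage k of the
    pyramid holds exactly k distinct items.
    """
    record: List[Tuple[float, bool]] = []
    n_right = n_wrong = 0
    for _ in range(n_stages):
        difficulty = start + up * n_right - down * n_wrong
        correct = respond(difficulty)
        record.append((difficulty, correct))
        if correct:
            n_right += 1
        else:
            n_wrong += 1
    return record

# Hypothetical testee who answers every item with difficulty <= 0.3 correctly.
print(administer_pyramid(5, 0.0, 0.2, 0.2, lambda d: d <= 0.3))
```

Tracking only the counts of right and wrong answers is what makes the structure a triangle rather than a binary tree: two different response paths with the same counts arrive at the same item.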

The number of items to be answered by any testee is small when compared
to the total number of items in the pyramidal structure. In general, n(n+1)/2
items are needed to construct a pyramid of n stages when one item is attempted
at each stage.
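Because stage k of the pyramid holds k items, the total is an arithmetic series. The worked values use the ten-stage pyramid of Figure 1 and the fifteen-stage pyramid used later in this study:

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad n = 10:\ \frac{10 \cdot 11}{2} = 55, \qquad n = 15:\ \frac{15 \cdot 16}{2} = 120$$

A testee answers only n of these items, that is, 10 of 55 or 15 of 120.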






-2- 



[Figure 1. A typical ten-stage pyramidal test structure]



-3- 



Many ways of scoring pyramidal tests have been developed (see Weiss,
1974, pp. 30-34). The ranked difficulty of the final item has been used
as the individual's score (Bayroff, Thomas, & Anderson, 1960; Seeley, Morton,
& Anderson, 1962; Waters & Bayroff, 1971). Testees completing the pyramid
shown in Figure 1 could receive scores of from 1 to 10 under this scoring
method, since there are only ten items available at the tenth stage of
testing. The number of rank positions can be increased by assigning a
higher rank to those subjects answering the final item correctly than to
those who do not (Bayroff & Seeley, 1967; Waters, 1964). The difficulty
of the final item attempted has also been used to estimate an individual's
ability (Bayroff, 1969). Another scoring method branches the testee to a
hypothetical (n+1)th item following the final item and estimates its
difficulty (Hansen, 1969; Lord, 1971b; Weiss, 1974, p. 31). The difficulties
of all items attempted or all items correctly answered may be averaged to
provide a score based on more information (Larkin & Weiss, 1974). Lord
(1970, 1971b) has recommended an averaging method which excludes the first
item (since all testees attempt it) and includes the (n+1)th item.
Hansen (1969) has proposed a more complex scoring method which assigns an
estimated score to each item in the pyramid, whether or not it is attempted.
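Several of these scoring rules can be computed from a single testee's item record. The sketch is illustrative only and rests on stated assumptions: the hypothetical (n+1)th item is placed one branching step beyond the final item (as in a fixed-offset pyramid), the "split" rank is one possible reading of ranking a correct final answer above an incorrect one, and Hansen's (1969) method is not shown. The function and key names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# One (difficulty, answered_correctly) pair per stage, in order of administration.
Record = List[Tuple[float, bool]]

def pyramid_scores(record: Record, up: float, down: float) -> Dict[str, Optional[float]]:
    """Compute several of the pyramidal scores described above for one testee."""
    difficulties = [d for d, _ in record]
    # Right answers before the last item fix which of the n final-stage items
    # was reached: 0 is the easiest, n - 1 the hardest.
    final_index = sum(correct for _, correct in record[:-1])
    final_d, final_ok = record[-1]
    # Assumed (n+1)th item: one more branching step after the final item.
    next_d = final_d + up if final_ok else final_d - down
    right_d = [d for d, correct in record if correct]
    return {
        "number_correct": float(sum(correct for _, correct in record)),
        "final_rank": float(final_index + 1),                       # 1..n
        "final_rank_split": float(2 * final_index + final_ok + 1),  # 1..2n
        "final_difficulty": final_d,
        "next_item_difficulty": next_d,
        "mean_attempted": mean(difficulties),
        "mean_correct": mean(right_d) if right_d else None,
        # Lord's average: drop the common first item, add the (n+1)th item.
        "lord_average": mean(difficulties[1:] + [next_d]),
    }

example = [(0.0, True), (0.2, True), (0.4, False), (0.2, True), (0.4, False)]
print(pyramid_scores(example, up=0.2, down=0.2))
```

The mean of correctly answered items is undefined for a testee with no correct answers, so the sketch returns `None` in that case rather than guessing a value.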

Weiss (1974) compares pyramidal tests with other strategies of
adaptive testing. The research literature on pyramidal adaptive tests has
been reviewed by Weiss and Betz (1973) and summarized by Larkin and Weiss
(1974).

Two-stage Tests 

A two-stage test consists of a preliminary or routing test followed by
one of several measurement tests. Figure 2 illustrates a sample two-stage
structure. The purpose of the routing test is to provide an approximate
estimate of the testee's ability level so that a measurement test of
appropriate difficulty can be selected for each testee. The routing test
can be composed of items with difficulties either peaked at the ability
level of the group taking the test (as shown in Figure 2) or distributed
throughout the range of ability under consideration (see Weiss, 1974, pp. 4-7).
The measurement tests are usually peaked tests of differing levels of
difficulty. The routing test is administered to the testee and his/her
score is determined. A measurement test of appropriate difficulty is
selected, based on the testee's score on the routing test. The measurement
test is then administered and the testee's score is determined.
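The routing-then-measurement sequence can be sketched as below. This is a minimal illustration with hypothetical names: items are represented only by their difficulties, `respond` stands in for the testee, routing uses a simple number-correct score compared against ascending cut scores, and each subtest is scored as raw number correct rather than with any particular ability-estimation formula.

```python
from typing import Callable, Dict, Sequence

def two_stage(
    routing_items: Sequence[float],
    cut_scores: Sequence[int],
    measurement_tests: Sequence[Sequence[float]],
    respond: Callable[[float], bool],
) -> Dict[str, int]:
    """Administer a routing test, then the measurement test its score selects.

    measurement_tests are assumed ordered from easiest to hardest, and
    cut_scores ascending with len(cut_scores) == len(measurement_tests) - 1:
    a routing score at or above cut_scores[i] moves the testee past test i.
    """
    routing_score = sum(respond(item) for item in routing_items)
    chosen = sum(routing_score >= cut for cut in cut_scores)
    measurement_score = sum(respond(item) for item in measurement_tests[chosen])
    return {
        "routing_score": routing_score,
        "test_index": chosen,
        "measurement_score": measurement_score,
    }

# Hypothetical testee who answers every item easier than 0.5 correctly.
print(two_stage([0.0] * 6 + [1.0] * 4, [5, 7, 9],
                [[-1.0] * 3, [0.0, 1.0, 0.2], [1.0], [2.0]],
                lambda d: d < 0.5))
```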

Variants of the two-stage routing procedure (see Weiss, 1974, p. 7)
include double-routing and "sequential" procedures (Cleary, Linn, & Rock,
1968a, b; Linn, Rock, & Cleary, 1969). The former requires two routing
tests to be administered. A testee's score on a preliminary routing test
determines which of several intermediate routing tests is attempted.
Branching to an appropriate measurement test is based on the testee's
performance on the second routing test. The sequential procedure involves
computing likelihood ratios after each response to items in the routing
test. Branching to a measurement test occurs when the likelihood ratio permits
a classification of the individual.
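The sequential procedure is described here only in outline. As an illustration of the general idea, and not a reconstruction of Linn et al.'s three- and four-group procedures, the sketch below applies Wald's two-hypothesis sequential probability ratio test to a single high/low split. The per-item proportions correct in each group (`p_high`, `p_low`) and the error rates `alpha` and `beta` are hypothetical inputs, and responses are assumed independent given group membership.

```python
import math
from typing import Optional, Sequence, Tuple

def sequential_route(
    responses: Sequence[bool],
    p_high: Sequence[float],
    p_low: Sequence[float],
    alpha: float = 0.10,
    beta: float = 0.10,
) -> Tuple[Optional[str], int]:
    """Classify a testee as "high" or "low" with Wald's sequential probability ratio test.

    p_high[i] and p_low[i] are assumed probabilities of answering routing item i
    correctly in the higher- and lower-scoring group. Returns the decision
    (None if the items run out first) and the number of items used.
    """
    upper = math.log((1 - beta) / alpha)   # crossing upward: decide "high"
    lower = math.log(beta / (1 - alpha))   # crossing downward: decide "low"
    log_lr = 0.0
    used = 0
    for correct, ph, pl in zip(responses, p_high, p_low):
        used += 1
        if correct:
            log_lr += math.log(ph / pl)
        else:
            log_lr += math.log((1 - ph) / (1 - pl))
        if log_lr >= upper:
            return "high", used
        if log_lr <= lower:
            return "low", used
    return None, used

print(sequential_route([True] * 10, [0.8] * 10, [0.4] * 10))
```

A consistent responder is classified after only a few items, while an inconsistent one may exhaust the routing items without a decision; a real procedure needs a fallback rule for that case.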

Most methods of scoring two-stage tests have used information from 




-4- 



[Figure 2. A sample two-stage test structure]




-5- 



both the routing and measurement subtests. Lord (1971c) and Betz and
Weiss (1973) have combined maximum likelihood ability estimates from the
routing and measurement subtests to determine an overall estimate of a
testee's ability. Linn, Rock, and Cleary (1969), on the other hand, did not
include a testee's performance on the routing test in some of their scoring
procedures.

Weiss (1974) compares two-stage tests with other strategies of adaptive 
testing, and discusses potential advantages and limitations of this approach. 
Research literature on two-stage adaptive testing has been reviewed by Weiss 
& Betz (1973) and Betz & Weiss (1973).

Research Comparing Two-stage and Pyramidal Tests 

The only study including both two-stage and pyramidal testing strategies
was reported by Linn, Rock, & Cleary (1969). That study, using "real-data
simulation" methods, was based on the responses of a large group of testees
to a 190-item conventional paper-and-pencil test. The item responses were
then used to simulate a testee's responses to two-stage and pyramidal adaptive
testing strategies. Five different two-stage strategies were compared to
two pyramidal strategies.

The first two-stage test included a 20-item routing test with a rectangular
distribution over a "broad range" of item difficulties, and four 20-item
measurement tests. The second employed a double-routing procedure. A
testee's score on a 10-item routing test determined which of two second-stage
10-item routing tests was administered. Scores on the second routing test
branched the testee to one of four 20-item measurement tests. The third
two-stage procedure used a 20-item "group discrimination" routing test. Items
in that test were those which showed the largest differences in proportion
correct between groups divided into quartiles on total scores for the original
190 items. The routing test in the two final strategies involved computing
likelihood ratios after each item, and branching occurred when the
likelihood ratio permitted a classification of the individual into groups based on
scores derived from the parent 190 items. These methods were called "sequential"
procedures. Both a three-group and a four-group sequential approach were
used. Linn et al. used two methods to score their two-stage tests. One
used the information obtained from the routing test while the other did not.

Linn et al. studied two variations of the pyramidal strategy. The first
pyramidal test had ten stages with an entry point of p=.65, a step size of
.02, and an equal offset. Items were weighted according to difficulty, and
scores represented the sum of the weights of items attempted by each testee.
The second pyramidal strategy consisted of five stages with five items per
stage (see, e.g., Weiss, 1974, pp. 25-26). Branching occurred from block to
block. This pyramid was scored using a weighted scoring scheme similar to
that used for the single-item pyramid.

All seven adaptive tests were compared to five conventional subtests of
from 10 to 50 items selected from the same 190-item parent test. Scores on
the two-stage strategies correlated from .93 to .97 with scores on the 190-item
parent test, while the shortened conventional tests had correlations of






-6- 



from .89 to .96 with the full conventional test. The 25-item pyramid showed
a comparable correlation (.95), but the ten-item pyramid correlated only
.87 with the parent test.

Since all the items in the adaptive and shortened conventional tests
were also included in the longer parent test, and since the correlations with
the parent test increased as the length of the shorter tests increased, it
is possible that the degree of correlation obtained in this study could be
due partly to the number of items in common between the tests.

When achievement test criteria were obtained, scores on the ten-stage
pyramidal test correlated higher with the criterion measures than did scores
on conventional tests of the same length in seven of eight comparisons.
The 25-item pyramids correlated more highly with the criteria than the 50-item
conventional tests. With one exception, the two-stage tests also achieved
higher correlations with the criterion achievement tests than did conventional
tests of the same length. Under one scoring procedure the 40-item group
discrimination two-stage test was more highly correlated with outside
criteria, in four of eight comparisons, than even the 190-item parent
conventional test.

Linn et al.'s data permit a comparison of the relative validity of
their two-stage and pyramidal tests as predictors of the achievement test
criteria. Their data show that, with the exception of the sequential two-
stage strategy, two-stage tests had higher correlations with the criteria than
did the pyramidal tests. The ten-item pyramidal test had the lowest validities
of all of the adaptive tests, and the validities of the 25-item block branching
pyramid were about equal to those of the sequential two-stage test. Within
the two-stage tests, the group discrimination approach had slightly higher
validities than the other two-stage tests.

These comparisons of the relative validity of the two-stage and pyramidal 
strategies did not take account of the relative numbers of items in the 
different tests. While the two-stage tests were all composed of about 40 items, 
only 10 items were administered in one pyramidal test and 25 in the other. 
Linn et al. (pp. 142-143) estimated the lengths of conventional tests parallel to
the 190-item parent test which would be necessary to achieve the same validity
as each of the adaptive tests. When these values were compared to the actual
adaptive test lengths, an index of "relative saving in test length" was
obtained. The group discrimination and three-group sequential methods showed
the highest ratios, followed by the 25-item and 10-item pyramidal strategies.
The four-group sequential method showed the lowest ratios. 

Purpose 

Although Linn et al. (1969) used both pyramidal and two-stage tests in 
their study simulating adaptive testing, their major objective was to study 
the relationships between short adaptive and conventional tests and longer 
parent tests or achievement test criteria. The present investigation is one 
of a series of studies designed to further compare adaptive testing strategies 
using other criteria. These studies use actual computer administration of 
adaptive tests to groups of college students. The results of different adaptive 
testing strategies have been compared with those obtained from conventional 
testing approaches (Betz & Weiss, 1973; Larkin & Weiss, 1974) with respect to




-7- 



the accuracy of ability estimation, test-retest stability, internal consistency 
reliabilities, and other psychometric characteristics. In addition, more 
fundamental questions about each strategy are under consideration, including 
the investigation of various item difficulty structures for each of the 
adaptive strategies, problems in determining branching or routing rules, and 
the determination of meaningful and reliable scoring methods for each adaptive 
strategy. 

In this series of studies, all tests, both conventional and adaptive, 
were constructed for administration by computer (DeWitt & Weiss, 1974). 
Testing strategies were administered two at a time so that scores from one 
adaptive strategy could be compared with those from another, and so that scores 
from adaptive and conventional tests could be directly compared. In order to 
determine the stability of scores from each of a number of scoring methods, 
each testee was administered the same test on two occasions with periods 
averaging about six weeks between the initial and final testing. In some 
studies, conventional and adaptive strategies were paired on both test and 
retest and the comparative stabilities of the two strategies were studied. 
Other studies focused on comparisons among the various adaptive strategies.

The present analysis was undertaken for the purpose of directly comparing 
the psychometric characteristics of scores obtained from a two-stage strategy 
and the pyramidal approach. Previous studies in this series have reported the 
results of analyses of computer-administered two-stage (Betz & Weiss, 1973) and 
pyramidal tests (Larkin & Weiss, 1974) in comparison with conventional tests. 
However, a different group of subjects was used in each of those studies. In 
the present study, the characteristics of scores derived from two-stage and 
pyramidal tests are compared directly using the same group of subjects. 

METHOD 

One set of test data was derived from the administration of a two-stage
test and a pyramidal test to 111 subjects.

The 15-stage pyramidal item structure was composed of 120 items. Each
testee completed only fifteen items. The two-stage test required 130 items
for its construction and each subject answered 40 items. Both tests drew 
items from the same item pool, and eighty items were common to both test 
structures. Although each testee could be administered a maximum of 15 items 
common to both tests, it was also possible that a testee could receive no 
common items. 

In order to detect the presence of the effects of boredom or fatigue, the 
order of presentation was randomized on both testings. Each adaptive test was 
administered first to half the testees and administered second to the remaining 
testees. 

Test Construction 

Item Pool

The item pool was composed of 369 five-alternative multiple-choice
vocabulary questions normed on college undergraduates (McBride & Weiss, 1974).






-8- 



Using estimates of item difficulty (proportion correct) and item discrimination
(biserial correlation with total score on the norming tests), approximations to
the normal ogive item parameters a and b (Lord & Novick, 1968, pp. 376-379)
were determined using the following formulas:

$$a = \frac{r_{bis}}{\sqrt{1 - r_{bis}^{2}}} \qquad (1)$$

$$b = \frac{-\Phi^{-1}(p)}{r_{bis}} \qquad (2)$$

where $a$ is the normal ogive index of discrimination,
$b$ is the normal ogive index of difficulty,
$r_{bis}$ is the biserial correlation of item response and total score,
$p$ is the proportion correct,
and $\Phi^{-1}$ is the inverse of the cumulative normal distribution
corresponding to the proportion correct.

Items with biserials lower than .30 were not used in the item pool. The norming
studies indicated that there was some difficulty-discrimination interaction
such that the pool contained disproportionately more highly discriminating
items in the lower range of item difficulty.
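The conversion from proportion correct and biserial correlation to normal ogive a and b can be sketched as follows. This assumes the sign convention in which larger b means a harder item, which agrees with Table 1, where the easiest measurement test has the most negative mean b. The function name is hypothetical, and the inverse normal comes from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist
from typing import Tuple

def normal_ogive_params(p: float, r_bis: float) -> Tuple[float, float]:
    """Approximate normal ogive (a, b) from proportion correct and biserial r.

    a = r / sqrt(1 - r^2) and b = -Phi^{-1}(p) / r, so easier items
    (larger p) receive lower b values.
    """
    if not (0.0 < p < 1.0 and 0.0 < r_bis < 1.0):
        raise ValueError("need 0 < p < 1 and 0 < r_bis < 1")
    z = NormalDist().inv_cdf(p)          # Phi^{-1}(p)
    a = r_bis / (1.0 - r_bis ** 2) ** 0.5
    b = -z / r_bis
    return a, b

print(normal_ogive_params(0.5, 0.6))    # item of median difficulty
```

Under the pool's screening rule (biserial of at least .30), the smallest admissible a is .30/sqrt(.91), or about .31.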

Construction of the Two-stage Test 

The two-stage test used in this study was composed of a 10-item routing
test and four 30-item measurement tests. This adaptive test was the
"Two-stage 2" test in a simulation study in a previous report in this series
(Betz & Weiss, 1974).

Routing test. In order to make a good initial assessment of ability and
to assign testees to measurement tests while minimizing the probability of an
assignment error, the 10 items in the routing test were selected to have a
high mean discrimination. As shown in Table 1, mean discrimination for the
routing test was a=.702. The standard deviation of the item discriminations
was .163. Appendix A, which shows difficulty and discrimination values for
each item in the routing subtest, indicates that the lowest discrimination
was a=.50 and the highest was a=.98.

The routing subtest was a peaked test of median difficulty items which
were highly discriminating. The items in the routing subtest had a mean
difficulty level of b=-.232. Table 1 shows that the standard deviation of the
item difficulties in the routing test (.050) was very low when compared to those
of the measurement tests.

After the routing test was completed, an estimate of the testee's ability 
was made in standard units (see Betz & Weiss, 1974, pp. 11-12). Subjects were 
assigned to the measurement test closest in difficulty to their estimated 






-9- 



ability. Thus, those testees with from 0 to 4 items correct on the routing test
were assigned to the least difficult of the measurement tests. Those with
scores of 5-6, 7-8, and 9-10 were routed to one of the three more difficult
measurement tests.
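The score bands amount to a lookup with three cut scores. In this sketch the mapping of bands to Table 1's test numbers is an assumption based on the tests' mean difficulties (test 4 easiest, test 1 hardest); the text itself says only that higher bands went to the more difficult tests. Names are hypothetical.

```python
from bisect import bisect_right

# Routing scores (number correct out of 10) at which a testee moves up a level.
CUT_SCORES = [5, 7, 9]
# Assumed ordering, inferred from Table 1 mean b values: test 4 is the
# easiest measurement test and test 1 the hardest.
TESTS_EASIEST_FIRST = [4, 3, 2, 1]

def assign_measurement_test(routing_score: int) -> int:
    """Return the Table 1 number of the measurement test for a routing score."""
    if not 0 <= routing_score <= 10:
        raise ValueError("routing score must be between 0 and 10")
    return TESTS_EASIEST_FIRST[bisect_right(CUT_SCORES, routing_score)]

print([assign_measurement_test(s) for s in range(11)])
```

`bisect_right` counts how many cut scores are at or below the routing score, so a score exactly equal to a cut score is sent to the harder test.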



Table 1

Means and Standard Deviations of Normal Ogive Item
Parameters for Two-stage and Pyramidal Tests

                            No. of    Difficulty (b)     Discrimination (a)
Test                        Items     Mean      S.D.      Mean      S.D.
Two-stage (all items)        130      -.072    1.251      .633      .183
  Routing                     10      -.232     .050      .702      .163
  Measurement 1               30      1.725     .558      .530      .126
  Measurement 2               30       .350     .297      .684      .214
  Measurement 3               30      -.709     .189      .611      .122
  Measurement 4               30     -1.603     .373      .683      .213
Pyramid                      120      -.094    1.256      .799      .457


Measurement tests. In selecting items for each of the four measurement
tests, the following rationale was used. The quantity $\bar a(\bar b_m - \bar b_r)$
was computed, where $\bar b_r$ is the mean difficulty of the routing test. The
$\bar a$ parameter is the mean discrimination for all 130 items in the two-stage
structure, i.e., .633; $\bar b_m$ represents the mean difficulty of the measurement
test in question. Betz and Weiss (1974) have shown that to obtain four measurement
tests suitable for subjects routed to them, the values required for
$\bar a(\bar b_m - \bar b_r)$ were 1.239, .368, -.302 and -.868.
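As a consistency check, the quantity can be recomputed from the Table 1 means; the products for measurement tests 1 through 4 come out to 1.239, .368, -.302 and -.868. This only reproduces the arithmetic from the reported means and does not reconstruct how Betz and Weiss (1974) derived the required values. Names are hypothetical.

```python
# Table 1 values: mean a over all 130 two-stage items, mean b of the routing
# test, and mean b of each measurement test.
A_BAR = 0.633
B_ROUTING = -0.232
B_MEASUREMENT = {1: 1.725, 2: 0.350, 3: -0.709, 4: -1.603}

def routing_offset(b_measurement: float, a_bar: float = A_BAR,
                   b_routing: float = B_ROUTING) -> float:
    """a_bar * (b_m - b_r): measurement-test difficulty relative to the routing test."""
    return a_bar * (b_measurement - b_routing)

offsets = {test: round(routing_offset(b), 3) for test, b in B_MEASUREMENT.items()}
print(offsets)  # {1: 1.239, 2: 0.368, 3: -0.302, 4: -0.868}
```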

Table 1 shows that the average of the discrimination parameters for the
measurement tests ranged from .530 to .684, and that the variability
of discrimination values for the measurement tests was about the same as
the variability of item discriminations in the routing test. The measurement
tests were not as peaked as the routing test, as indicated by the larger
ranges and standard deviations of their difficulties. The average difficulties
of the measurement tests, as shown in Table 1, approximated the desired values,






-10- 



but measurement test 1 was somewhat more difficult than the target value,
and measurement tests 3 and 4 were somewhat easier. Appendix A gives the
normal ogive item parameters for the items in each measurement test.

Scoring. The two-stage test was scored by the same method used by Betz
and Weiss (1973, 1974) who adapted their method from studies by Lord (1971c). 
Essentially, maximum likelihood estimates of ability were obtained from both 
subtests and then weighted and summed. The measurement test was given three 
times the weight of the routing test because there were three times as many 
items in it as in the routing test. 

The formula used to obtain the ability estimates for both subtests
completed by each testee was:

$$\hat\theta = \bar b + \frac{1}{\bar a}\,\Phi^{-1}\!\left(\frac{x - cm}{m - cm}\right) \qquad (3)$$

where $\bar a$ is the mean discrimination of the subtest,
$x$ is the number correct,
$m$ is the number of items in the subtest,
$c$ is the chance score level,
$\bar b$ is the mean difficulty of items in the subtest,
and $\Phi^{-1}$ is the inverse of the cumulative normal distribution
function corresponding to the proportion correct.

For perfect scores ($x = m$), $\hat\theta$ could not be determined. Therefore, when $x$ was
equal to $m$, it was replaced by $x = m - .5$. For scores at or below chance
($x \le cm$), $\hat\theta$ was also indeterminate and $x$ was replaced by $x = cm + .5$.

The scores of the subtests were combined in the following way:

$$\hat\theta = \tfrac{1}{4}\,\hat\theta_r + \tfrac{3}{4}\,\hat\theta_m \qquad (4)$$

where $\hat\theta$ is the combined ability estimate,
$\hat\theta_r$ is the ability estimate obtained from the routing test,
and $\hat\theta_m$ is the ability estimate obtained from the measurement test.

This combined ability estimate can be interpreted as a standard normal deviate
(see Betz & Weiss, 1973, pp. 14-15).
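The per-subtest estimate and the weighted combination can be sketched as follows. Assumptions, flagged here and in the code: `c = .2` (chance responding on five-alternative items, so the chance score is cm), and the 3:1 weighting of measurement over routing test is normalized so the weights sum to one, which keeps the combined value on the standard normal scale. Function names are hypothetical.

```python
from statistics import NormalDist

def theta_hat(x: float, m: int, a_bar: float, b_bar: float, c: float = 0.2) -> float:
    """Number-correct ability estimate for one subtest.

    theta = b_bar + Phi^{-1}((x - cm) / (m - cm)) / a_bar, with x moved to
    m - .5 for perfect scores and to cm + .5 for scores at or below chance.
    c = .2 is an assumption (five-alternative items).
    """
    chance = c * m
    if x >= m:
        x = m - 0.5
    elif x <= chance:
        x = chance + 0.5
    return b_bar + NormalDist().inv_cdf((x - chance) / (m - chance)) / a_bar

def combined_theta(theta_routing: float, theta_measurement: float) -> float:
    """Routing and measurement estimates weighted 1:3 (assumed normalized to sum to one)."""
    return 0.25 * theta_routing + 0.75 * theta_measurement

# Routing test from Table 1: 10 items, mean a = .702, mean b = -.232.
print([round(theta_hat(x, 10, 0.702, -0.232), 3) for x in range(11)])
```

Applied to routing scores of 0 through 10 with the routing test's Table 1 means, the estimate lies closest to the mean difficulty of measurement test 4 for scores 0-4, test 3 for 5-6, test 2 for 7-8, and test 1 for 9-10, which matches the routing score bands given earlier.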






-11- 



Construction of the Pyramidal Test

The pyramidal test used in this study was Pyramid 3, studied by Larkin &
Weiss (1974). It was composed of fifteen stages with a constant step size.
An up-one/down-one branching rule was used. Since n(n+1)/2 items are needed
for the construction of an n-stage pyramid, 15(15+1)/2 or 120 items were
selected from the item pool. The initial item was of median difficulty for
the testees of the norm group. The step size, that is, the increment or
decrement in item difficulty from one stage to the next, had a mean value of
b=.199, and a standard deviation of .08.

After establishing the initial item difficulty and step size, the available
items in the pool were divided into 29 groups on the basis of difficulty.
All items in a group had about the same b value and an a value of at least
.30. The items required were selected from each group according to their
discriminations. The items with the highest discriminations in each group
were selected for use in the pyramidal test. Paterson (1962) has suggested
that items in a pyramidal test be ordered within each column according to
discrimination with the most discriminating items appearing first. This
suggestion was followed in construction of this pyramidal test, as shown in
Appendix B, which gives the normal ogive difficulty and discrimination estimates
for each item in the pyramidal test. The item difficulties ranged from
b=-2.86 to b=2.61. The discrimination values varied from a=.41 to a=3.00.

Appendix B indicates that the initial item, which was presented to all
testees, had a difficulty of b=-.05. If the testee answered this item
correctly, he/she was branched to a more difficult item (b=.14) at stage 2.
An incorrect response branched the testee to an item easier (b=-.21) than the
initial item. The branching process continued until each testee had attempted
15 items.
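The pyramid layout and the up-one/down-one rule can be sketched as follows. The difficulties here are idealized: they assume a perfectly constant step from the initial item (b = -.05, step .199). The actual items in Appendix B deviate from this grid (the stage-2 items were b = .14 and b = -.21), so the numbers are illustrative only. All function names are hypothetical.

```python
def pyramid_item_count(n_stages: int) -> int:
    """An n-stage pyramid needs n(n+1)/2 items: stage k holds k items."""
    return n_stages * (n_stages + 1) // 2


def idealized_difficulty(stage: int, n_correct: int, b0: float, step: float) -> float:
    """Difficulty of the item at `stage` (1-based), reached after `n_correct`
    correct answers on earlier stages, assuming a constant step size."""
    n_wrong = (stage - 1) - n_correct
    return b0 + step * (n_correct - n_wrong)


def administer(responses, b0=-0.05, step=0.199):
    """Up-one/down-one branching: each correct answer moves the testee one
    step harder, and each incorrect answer moves the testee one step easier.
    Returns the difficulties of the items attempted and the number correct."""
    attempted = []
    n_correct = 0
    for stage, correct in enumerate(responses, start=1):
        attempted.append(idealized_difficulty(stage, n_correct, b0, step))
        n_correct += int(correct)
    return attempted, n_correct


print(pyramid_item_count(15))  # 120
difficulties, score = administer([True, False, True])
print([round(b, 3) for b in difficulties], score)  # [-0.05, 0.149, -0.05] 2
```

Indexing each item by (stage, number correct so far) is what makes the structure a pyramid. The k items at stage k correspond to the k possible counts of earlier correct answers.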

The means and standard deviations for difficulty and discrimination are
shown in Table 1. The average difficulty of the items in the pyramidal
structure was b=-.094, with a standard deviation of 1.256. The average
discrimination of the pyramid items was a=.799. When all items in each
adaptive test were considered, Table 1 shows that the overall difficulties
were almost the same: the 120 items in the pyramidal structure and the 130
items in the two-stage test had very similar means and standard deviations of
item difficulty. However, the pyramid was composed of more highly
discriminating items, and the variance of the item discriminations was much
higher in the pyramidal test.

Scoring. In order to compare ability estimates derived from various
scoring methods, four different methods were used to estimate ability. These
four methods were among those used in a previous investigation of pyramidal
testing (Larkin & Weiss, 1974). Method 1 was the number of correct responses.
This has been the most common scoring method used in other studies. For a
pyramid of 15 stages, 16 different number-correct scores are possible (0 to
15). Method 2 was the mean difficulty of the items attempted by each testee.
A similar approach involves averaging the difficulties of all items
but the first (since every testee attempts it) and including a hypothetical
sixteenth item (Lord, 1970, 1971b). Method 3 averages the difficulties of
the correctly answered items only. Under Method 4, subjects were scored by
the difficulty of the final item attempted in the pyramid; since the branching






strategy actually adapts the difficulties of the items to the ability of
the testee, the difficulty of the final item reached should reflect the
testee's ability level (assuming that the pyramidal structure has enough
stages). Two other scoring methods, the (n+1)th difficulty score and the
all-item score (Hansen, 1969), were found in previous research (Larkin & Weiss,
1974) to correlate perfectly with the number-correct score and the mean
difficulty of all items attempted, respectively. Consequently, these two scoring
methods were not used in the present analyses.
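The four methods can be applied to one testee's record as in the sketch below. The dictionary keys and the example difficulties are hypothetical. Method 2 is implemented as the plain mean of the attempted items. It is not Lord's variant, which drops the first item and adds a hypothetical sixteenth.

```python
from statistics import mean


def pyramid_scores(difficulties, responses):
    """Four pyramidal scores from one testee's path.

    difficulties: b-values of the items attempted, in order of presentation
    responses:    True for a correct answer, False otherwise
    """
    correct_bs = [b for b, ok in zip(difficulties, responses) if ok]
    return {
        "number_correct": sum(responses),                  # Method 1
        "mean_difficulty_attempted": mean(difficulties),   # Method 2
        # Method 3 is undefined when no item was answered correctly.
        "mean_difficulty_correct": mean(correct_bs) if correct_bs else None,
        "final_item_difficulty": difficulties[-1],         # Method 4
    }


# Right, right, wrong, right on a pyramid with an idealized step of .2.
scores = pyramid_scores([0.0, 0.2, 0.4, 0.2], [True, True, False, True])
print(scores["number_correct"])  # 3
```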

Test Administration and Subjects 

Cathode ray terminals (CRTs) acoustically coupled to a time-shared
computer system were used to administer both the two-stage and pyramidal
tests (DeWitt & Weiss, 1974). Items were presented one at a time on the CRT
screen. Subjects responded by typing the number corresponding to the
alternative they chose for each multiple-choice item. A total of 55 items (15 from the
pyramidal test and 40 from the two-stage test) was administered to each
testee. The order of presentation of the tests was randomized over subjects:
56 testees completed the pyramidal test first, and the remaining 55
testees completed the two-stage test first. At the completion of testing,
subjects were informed of the total number of items they had answered correctly.

The testees were undergraduates enrolled in general psychology or
psychological statistics courses at the University of Minnesota. Because this
combination of adaptive tests was given as the second session of a two-part
study, all had had previous experience with computer-administered tests. All
subjects were given the opportunity to review instructions explaining the
operation of the CRTs prior to testing. A proctor was available in the
testing room to begin the testing and to provide further assistance to any
testee having difficulty with the equipment. No time limit was imposed;
testees were informed that they might take as much time as necessary to finish
the tests.

Analysis 

The data analyzed in the present study consisted of five scores, one 
two-stage score and four pyramidal scores, for each testee. 

Order Effects 

The effects of the order of administration on test scores were investigated 
by comparing scores of the testees who received each strategy first with those 
who received that strategy second in the series of two tests. In this manner 
fatigue, practice, or carry-over effects between strategies could be detected. 
Because the scores were expected to be highly correlated, a one-way multivariate
analysis of variance was used, with all five scores considered simultaneously
as dependent variables.
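With only two order groups, a one-way MANOVA reduces to Hotelling's two-sample T². T² converts exactly to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom. The sketch below assumes NumPy and uses synthetic data in place of the study's scores. With five scores and groups of 56 and 55, the degrees of freedom are (5, 105). The report does not say which MANOVA program was actually used.

```python
import numpy as np


def two_group_manova_f(group_a, group_b):
    """Exact F for a one-way MANOVA with two groups, via Hotelling's T^2.

    group_a, group_b: arrays of shape (n_subjects, n_scores).
    Returns (F, (df1, df2)).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n1, p = a.shape
    n2 = b.shape[0]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-group covariance matrix.
    pooled = ((n1 - 1) * np.cov(a, rowvar=False)
              + (n2 - 1) * np.cov(b, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * mean_diff @ np.linalg.solve(pooled, mean_diff)
    f = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return f, (p, n1 + n2 - p - 1)


rng = np.random.default_rng(0)
pyramid_first = rng.normal(size=(56, 5))    # synthetic stand-ins for five scores
two_stage_first = rng.normal(size=(55, 5))
f, df = two_group_manova_f(pyramid_first, two_stage_first)
print(df)  # (5, 105)
```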

Characteristics of Score Distribution 

One objective of the present study was to compare the distributions of 
scores on the 40-item two-stage test with those obtained from each method of 
scoring the 15-stage pyramidal test. The appropriateness of the test difficulty,
the relative variability of each scoring procedure, and the shapes of the
obtained score distributions were examined.




Because different units were used in scoring the tests, the standard
deviation of each scoring method was divided by the potential range of scores
under that method. The resulting value is an index of relative variability
(Betz & Weiss, 1973). This index shows how effectively each scoring method
utilizes its entire score range. The range of possible scores on the
two-stage test was derived by using Formulas 3 and 4 to compute estimates of θ
for perfect and chance scores. This range was 4.66 - (-5.30) = 9.96. The ranges
for the pyramidal scoring methods were as follows: (1) the number-correct
range was 15; (2) the range for the mean-difficulty-attempted score was the
difference between the score of a testee answering all items correctly
and the score of one responding incorrectly to all items, and this value was
2.79; (3) the range for the mean-difficulty-correct score was the difference
between the score of a subject with 15 correct responses and the lowest
(n+1)th score (the latter value was used because a testee with no items answered
correctly would have an undefined mean-difficulty-correct score), and
this range was 4.42; (4) the final-item-difficulty range was the difference
between the easiest and most difficult terminal items, or 5.48.
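As a check, the index can be computed from the ranges given above and the standard deviations reported in Table 3. The sketch below reproduces the "Proportion of Range Utilized" column. The variable names are illustrative.

```python
# Potential score ranges stated in the text.
score_ranges = {
    "two-stage": 4.66 - (-5.30),
    "number correct": 15,
    "mean difficulty-attempted": 2.79,
    "mean difficulty-correct": 4.42,
    "difficulty of final item": 5.48,
}
# Standard deviations as reported in Table 3.
standard_deviations = {
    "two-stage": 1.30,
    "number correct": 2.41,
    "mean difficulty-attempted": 0.55,
    "mean difficulty-correct": 0.60,
    "difficulty of final item": 0.93,
}


def relative_variability(sd: float, score_range: float) -> float:
    """Index of relative variability: SD divided by the potential score range."""
    return sd / score_range


# Prints 0.13, 0.16, 0.20, 0.14 and 0.17 in turn.
for method, sd in standard_deviations.items():
    print(f"{method}: {relative_variability(sd, score_ranges[method]):.2f}")
```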

In addition to the mean and variability indices, the skew and kurtosis
of each distribution were computed, and the significance and direction of their
departures from normality were determined (McNemar, 1969, pp. 25-28, 87-88).
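The normality checks can be sketched with moment coefficients and their usual large-sample standard errors, sqrt(6/N) for skew and sqrt(24/N) for kurtosis. This is an assumption: McNemar's exact formulas may differ slightly. With N = 111, this approximation flags only the number-correct skew (0.58) in Table 3 at p < .05, which is the only coefficient marked significant there.

```python
from math import sqrt


def skew_and_kurtosis(scores):
    """Moment coefficients: skew g1 and excess kurtosis g2 (0 for a normal curve)."""
    n = len(scores)
    mu = sum(scores) / n
    m2 = sum((x - mu) ** 2 for x in scores) / n
    m3 = sum((x - mu) ** 3 for x in scores) / n
    m4 = sum((x - mu) ** 4 for x in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def normality_z(skew, kurtosis, n):
    """Approximate z statistics using standard errors sqrt(6/n) and sqrt(24/n)."""
    return skew / sqrt(6 / n), kurtosis / sqrt(24 / n)


z_skew, z_kurt = normality_z(0.58, 0.08, 111)  # number-correct scores, Table 3
print(round(z_skew, 2), abs(z_skew) > 1.96)    # 2.49 True
```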

Relationships between Two-stage and Pyramidal Scores 

To determine the relationships among the pyramidal scores and their 
relationships to two-stage scores, product-moment correlations and correlation 
ratios (eta) were computed. The latter were computed to determine whether the 
relationships between scores on the two strategies were curvilinear. In 
determining the etas, both the regression of two-stage scores on pyramidal
scores and the regression of pyramidal scores on two-stage scores were
computed.
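Below is a sketch of r, eta, and a test for curvilinearity. Y is grouped by the distinct values of X. Curvilinearity is tested with the standard F ratio for eta² - r², with (k - 2, N - k) degrees of freedom, where k is the number of groups. The report does not say how continuous scores, such as the two-stage θ estimates, were grouped into class intervals, so any binning of such scores would be an assumption.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_ratio(x, y):
    """eta for the regression of y on x; returns (eta, number of x groups)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    grand = sum(y) / len(y)
    ss_total = sum((yi - grand) ** 2 for yi in y)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    return sqrt(ss_between / ss_total), len(groups)


def curvilinearity_f(eta, r, n, k):
    """F test of eta^2 - r^2 with (k - 2, n - k) degrees of freedom."""
    return ((eta ** 2 - r ** 2) / (k - 2)) / ((1 - eta ** 2) / (n - k))


x = [-1, -1, 0, 0, 1, 1]
y = [1, 2, 0, 1, 1, 2]  # U-shaped: no linear trend, clear curvature
r = pearson_r(x, y)
eta, k = correlation_ratio(x, y)
print(round(curvilinearity_f(eta, r, len(x), k), 2))  # 2.67
```

When the group means fall exactly on a straight line, eta equals |r| and the F ratio is zero. This is the pattern Table 7 shows for the pyramidal intercorrelations.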

Internal Consistency Reliability 

Data on the reliabilities of the two-stage and pyramidal tests are import- 
ant to provide a point of reference for interpreting the correlations between 
scores on the two adaptive strategies. 

The internal consistency reliability of the two-stage test was determined
by Hoyt's (1941) method. This index can be computed only if every subject
attempts each item on a test. For this reason, the two-stage test had to
be treated as five separate tests. Reliabilities were computed separately
for the routing test, using the responses of the total group of subjects, and
for each of the four measurement tests, using the responses of those subjects
routed to each measurement subtest. To compare the internal consistency of
the 10-item routing test with that of the 30-item measurement tests, the
Spearman-Brown formula was used to estimate the reliability of a 30-item routing
subtest from the testees' responses to 10 items.
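Hoyt's coefficient comes from a persons × items analysis of variance. It is numerically identical to coefficient alpha (KR-20 for right/wrong items). The Spearman-Brown step-up from 10 to 30 items uses k = 3. The sketch below shows both. Function names are illustrative. The routing-test value of .72 is the one reported in Table 6.

```python
def hoyt_reliability(matrix):
    """Hoyt's ANOVA reliability (MS_persons - MS_residual) / MS_persons.

    matrix: one row per testee, one 0/1 entry per item; every testee must
    have attempted every item.
    """
    n_p, n_i = len(matrix), len(matrix[0])
    grand = sum(map(sum, matrix)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in matrix]
    item_means = [sum(row[j] for row in matrix) / n_p for j in range(n_i)]
    ss_persons = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_items = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_residual = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_i - 1))
    return (ms_persons - ms_residual) / ms_persons


def spearman_brown(r: float, k: float) -> float:
    """Reliability of a test k times as long as the one yielding r."""
    return k * r / (1 + (k - 1) * r)


# Routing test: r = .72 for 10 items; lengthening to 30 items means k = 3.
print(round(spearman_brown(0.72, 3), 2))  # 0.89
```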

Because all testees do not answer the same subset of items under the 
pyramidal strategy, its internal consistency reliability cannot be determined 
satisfactorily (see Larkin & Weiss, 1974). Consequently, to make meaningful 
comparisons between the reliabilities of the pyramidal and two-stage tests, 
the test-retest correlations for each strategy determined from two previous 






empirical studies (Betz & Weiss, 1973; Larkin & Weiss, 1974) were used.

Mis-routing

Mis-routing occurs in the two-stage strategy when a testee is routed to
a measurement test of inappropriate difficulty. The following criteria were
used (see Betz & Weiss, 1973) to determine the proportion of testees who
were mis-routed. All testees who obtained perfect scores (30) on their
measurement subtest were considered to have been routed to a test too easy
for them. Testees with subtest scores at or below chance level
(i.e., 6 or fewer correct) were considered to have been assigned to a measurement test
too difficult for them. If a testee met either of the two criteria, he/she
was classified as having been mis-routed by the routing test.
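The two criteria amount to a simple rule on each measurement-subtest score. In the sketch below, a chance proportion of .2 is chosen because it reproduces the stated cutoff of 6 correct on 30 items. The function name is hypothetical.

```python
def is_misrouted(measurement_score: int, n_items: int = 30, chance: float = 0.2) -> bool:
    """Apply the two mis-routing criteria to a measurement-subtest score."""
    too_easy = measurement_score >= n_items            # perfect score
    too_hard = measurement_score <= chance * n_items   # at or below chance
    return too_easy or too_hard


scores = [30, 6, 7, 18, 29]
print([is_misrouted(s) for s in scores])  # [True, True, False, False, False]
```

The misclassification rate is then the number of flagged testees divided by the sample size.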

Intercorrelations of Pyramidal Scores

Product-moment correlations were computed for all pairs of pyramidal
scoring methods to determine the interrelationships among them. Correlation
ratios were computed and compared with the product-moment correlations to
detect possible curvilinear relationships.



RESULTS 



Order Effects 

Table 2 shows the means and standard deviations by scoring method and
strategy for the groups completing the pyramidal or the two-stage test first.
The one-way multivariate analysis of variance resulted in an F-value of .92
with an associated probability of .47. Thus the two sets of mean scores
obtained under the two orders of administration were not significantly
different. As a result, the data from both order groups were combined for
all further analyses.

Table 2

Means and Standard Deviations for Subgroups
Completing Pyramidal and Two-Stage Tests in
Different Orders

                                Pyramid First      Two-stage First
                                   (N=56)              (N=55)
Test and Scoring Method         Mean    S.D.        Mean    S.D.

Pyramidal Test
  Number correct                8.21    2.55        7.64    2.24
  Mean difficulty-attempted     0.10    0.56       -0.09    0.53
  Mean difficulty-correct      -0.02    0.61       -0.22    0.57
  Difficulty of final item      0.17    0.97       -0.06    0.88
Two-stage Test                 -0.16    1.39       -0.50    1.19



Score Distributions 

Pyramidal test. Descriptive statistics for the pyramidal and two-stage
test scores are presented in Table 3. The mean number-correct score of 7.93
indicated that the subject group as a whole answered approximately half of the
15 items in the pyramid correctly, suggesting that the difficulty of the test
was appropriate for the ability of the group tested. The two mean difficulty
scoring methods and the final item difficulty scoring method all had means
of about 0.0. Since the test was composed of items with a mean difficulty
of -.094, this result was expected. These results also suggest that, on the
average, few items were answered correctly as a result of guessing,
since guessing would have produced scores above the average of the norming
group.



Table 3

Descriptive Statistics for Distributions of Scores
from Pyramidal and Two-stage Tests
(N = 111)

Test and                                            Proportion of
Scoring Method               Mean   Median   S.D.   Range Utilized   Skew    Kurtosis

Pyramidal Test
  Number correct             7.93    7.43    2.41        .16         0.58*     0.08
  Mean difficulty-attempted  0.01   -0.12    0.55        .20         0.42     -0.47
  Mean difficulty-correct   -0.12   -0.23    0.60        .14         0.03      0.19
  Difficulty of final item   0.06   -0.08    0.93        .17         0.44     -0.20
Two-stage Test              -0.33   -0.54    1.30        .13         0.35      0.29

*Statistically significant at p<.05.



The variabilities for each scoring method are also shown in Table 3. 
The final item difficulty score had a standard deviation of about 1.0, again 
reflecting the characteristics of the standardized b-values. Because of the 
restriction in the range of possible score values resulting from the use of 
averages, the two mean difficulty scores, also computed from b-values, had 
standard deviations only about half as large as the final item difficulty 
scoring method. When variability is expressed as a proportion of each method's 
potential range, as shown in Table 3, the scoring methods are more easily 
compared. The mean-difficulty-correct scoring method utilized the smallest 
proportion of its available range (.14), while the mean-difficulty-attempted 
method used the largest proportion of its range (.20). The number correct 
score and the final item difficulty scores both utilized about the same 
proportion of their range (.16 and .17). 




All scoring methods had distributions which were slightly positively
skewed. The distribution of number-correct scores was the most highly skewed,
and its skewness was significantly different from zero. The mean difficulty
of all items correctly answered showed almost no skew.

Two score distributions, mean-difficulty-attempted and difficulty of
final item, were platykurtic, although not significantly so. The number-correct
and mean-difficulty-correct distributions were slightly leptokurtic. The
flattest distribution was that of the mean-difficulty-attempted scoring
method. When both skewness and kurtosis are considered, the
mean-difficulty-correct scores showed the least departure from a normal distribution.

Two-stage test. The two-stage test scores, expressed in standard units,
had a mean of -0.33 and a standard deviation of 1.30. This mean was slightly
lower than those observed for the standardized pyramidal scores. The two-stage
test utilized a smaller proportion of its possible range (.13) than any method
of scoring the pyramidal test.

The distribution of two-stage scores was slightly positively skewed and
slightly leptokurtic, although in neither case was it significantly
different from a normal distribution. The skewness was comparable to that of
most methods of scoring the pyramidal test, but the kurtosis indicated that
the two-stage score distribution was more peaked than those of the pyramidal
test.

Table 4 summarizes the performance of the total group of testees on the
10-item routing test.



Table 4

Means and Standard Deviations of Scores on Subtests
of the Two-Stage Test

                                      Routing Test       Measurement Test    Two-stage Score
                                      (Number Correct)   (Number Correct)    (Standard Score)
Subject Group                    N    Mean    S.D.       Mean    S.D.        Mean    S.D.

All Subjects                   111    5.58    2.61      18.56    5.04       -0.33    1.30
Assigned to Measurement Test 1  21    9.33    0.48      17.00    6.01        1.48    0.97
Assigned to Measurement Test 2  20    7.40    0.50      17.20    4.43        0.22    0.65
Assigned to Measurement Test 3  27    5.63    0.49      18.59    4.87       -0.54    0.70
Assigned to Measurement Test 4  43    2.86    1.15      19.93    4.66       -1.33    0.81






Also shown are descriptive statistics of scores for the testees assigned to
each measurement subtest. On the routing test, the mean number of items
correct over all subjects was 5.58 out of 10, suggesting that the routing
test was peaked at a difficulty appropriate for the group taking the test.
That is, item difficulties (proportions correct) for the group tested averaged about .60, which is
the expected median difficulty after chance has been taken into account. The
standard deviation of number-correct scores was relatively large (2.61),
indicating that the routing test was effective in making an initial separation
of testees according to ability. The mean number correct across all four
measurement tests (18.56) showed that after testees had been routed into the
measurement tests, they answered slightly more than half of the measurement test
items correctly. For each 30-item measurement test considered separately,
the mean number correct varied from 17.00 to 19.93 (or between 57 and 66
percent correct). These findings imply that the measurement tests were also
of appropriate difficulty for the groups of testees routed to them. These
results, however, suggest that there were somewhat more successes due to
guessing in the two-stage test than in the pyramidal test.
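The arithmetic behind these statements can be made explicit. The ".60" figure is read here as the proportion correct halfway between the chance level c and a perfect score, c + (1 - c)/2. That reading, and the value c = .2 implied by the mis-routing criterion, are interpretations, since the text does not give the formula. The percentages follow directly from the Table 4 means.

```python
def chance_adjusted_midpoint(c: float) -> float:
    """Proportion correct halfway between the chance level c and 1 (assumed reading)."""
    return c + (1 - c) / 2


routing_proportion = 5.58 / 10                       # mean routing score / items
measurement_percents = [100 * m / 30 for m in (17.00, 19.93)]

print(round(chance_adjusted_midpoint(0.2), 2))       # 0.6
print(round(routing_proportion, 3))                  # 0.558
print([round(p) for p in measurement_percents])      # [57, 66]
```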



The variability of scores for each of the four subject groups was
relatively constant in three of the measurement tests; measurement test 1 had a
slightly larger variability of scores than the other measurement tests. The
variability in routing test scores for those subjects assigned to the least
difficult measurement test (1.15) was larger than that for the other groups
solely because of the specifications of the routing procedure (i.e., a wider
range of routing scores led to the assignment of testees to measurement test
4, the least difficult measurement test).



Relationship between Two-stage and Pyramidal Scores 

Eighty items were common to both the pyramidal and two-stage item pools.
The number of times a testee was administered the same item twice (once under
each strategy) ranged from 0 to 13, with a mean of 6.02 and a standard deviation
of 3.51. The correlations between the two tests are thus likely to be somewhat
inflated by the tendency of subjects to give the same response to
an item in both the two-stage and pyramidal tests, and should be interpreted
with caution.

Table 5 shows the results of the regression analysis of the relationship
between scores on the two-stage test and scores on the pyramidal test.
Product-moment correlations ranged from .79 for the mean-difficulty-correct
scoring method to .84 for the number-correct scoring method. Correlation
ratios ranged from .83 to .88. There was no general tendency toward
curvilinear relationships: in only one of the regressions was curvilinearity
significant at the .05 level. Thus, the relationship between scores on the
two-stage and pyramidal tests is high and primarily linear.






Table 5

Regression Analysis of Relationship between
Two-stage and Pyramidal Scores
(N = 111)

                                   Regression of Two-stage    Regression of Pyramid
                                   Score on Pyramid Score     Score on Two-stage Score
Scoring Method               r       eta      p^a                eta      p^a

Number correct              .84      .85      .71                .88      .25
Mean difficulty-attempted   .81      .86      .10                .86      .04*
Mean difficulty-correct     .79      .84      .23                .84      .15
Difficulty of final item    .83      .83      .56                .86      .21

^a Significance of curvilinearity.
*Significant at p<.05.






Internal Consistency Reliability 

Table 6 shows the internal consistency reliabilities for the two-stage
subtests. The internal consistency of the 10-item routing test (.72) was the
same as that of the least difficult 30-item measurement test. When the number
of items was equated for the 10-item routing test and the 30-item measurement
tests, the routing test showed the highest internal consistency of the five
subtests. This was likely due to the intentional restriction, by the routing
process, of the range of abilities of the subjects assigned to each measurement test.

Table 6

Internal Consistency Reliabilities for Subtests of the Two-stage Test

                            Number      Hoyt Reliability
Subtest             N       of Items    Coefficient

Routing            111      10          .72 (.89^a)
Measurement 1       21      30          .84
Measurement 2       20      30          .66
Measurement 3       27      30          .75
Measurement 4       43      30          .72

^a Estimated reliability for a 30-item test.




While this finding might have resulted from differences in item discriminations
among the subtests, comparison of the data in Table 1 with those in Table 6
shows that the measurement tests with the highest average discriminations had
the lowest reliabilities. Measurement test 1 (the most difficult measurement
test) did, however, have a reliability almost as high as the
corrected reliability of the routing test.

Mis-routing 

In the two-stage test, only one testee in the sample of 111 obtained a
score of 6 or less on the measurement subtest and was thus considered
mis-routed; a less difficult measurement test would have been more appropriate
for him/her. No perfect scores were obtained on any measurement subtest.
The misclassification rate was therefore 1/111 = .009.

Intercorrelations of Pyramidal Scores 

The intercorrelations of scores from the four methods of scoring the
pyramidal test are shown in Table 7. The highest observed correlation (r=.99)
was between the two mean difficulty scores. Number correct had the lowest
correlations (r=.93) with the two mean difficulty scores. There was no
evidence of curvilinearity in these data, since all the corresponding r's and etas were
virtually identical.



Table 7

Intercorrelations of Scores from
Pyramidal Scoring Methods
(N = 111)

Scoring                             Number     Mean difficulty-   Mean difficulty-
Method                              Correct    attempted          correct

Mean difficulty-attempted   r       .93
                            eta     .93
Mean difficulty-correct     r       .93        .99
                            eta     .93        .99
Difficulty of final item    r       .98        .95                .95
                            eta     .98        .95                .96






DISCUSSION AND CONCLUSIONS 






Score distributions for both the pyramidal and two-stage tests suggested
that both were of appropriate difficulty for the general ability level of
the testees. For the pyramidal test, the mean number-correct score was slightly more than
half of the possible range. The pyramidal test scores expressed
in standard units all had means of about zero. The two-stage scores
also had a near-zero mean. These results were similar to those obtained by
Larkin and Weiss (1974) and Betz and Weiss (1973). However, the latter study
found mean scores for a similar two-stage test to be slightly closer to
zero (-0.21 at time 1 and -0.02 at time 2) than in the present study (-0.33).
In the previous investigation of two-stage tests, standard deviations were
found to be 1.36 and 1.39. In the present study, the standard deviation was
1.30.

A "real data" simulation of the same two-stage test used in the present
investigation (Betz & Weiss, 1974) resulted in a mean score very near zero
(-.004) and a standard deviation of 1.05. Thus, real testees obtained a
lower average score, and were more variable on the two-stage test, than
simulated testees. These results suggest that there are very few chance
successes due to guessing in actual administration of two-stage tests, since
guessing would result in scores above zero, on the average.

The two-stage test was found to utilize a smaller proportion of its
possible score range (.13) than the pyramidal test. This finding is consistent
with the results of the two previous empirical studies in this series, in
which two-stage tests and two methods of scoring pyramidal tests used a greater
proportion of the score range than conventional tests. Betz and Weiss (1973)
found that, for a similar two-stage test, the proportion of range utilized was
.23. However, their index was computed by dividing the obtained standard
deviation by 6 (±3 s.d.) rather than by the actual possible range of two-stage
scores, thus inflating the index. The range of possible scores for the
two-stage test in the present study was 9.96 rather than 6. Therefore, the
proportion of range utilized is lower in the present study because of the
change in the method of computation.

Both adaptive tests provided score distributions which were slightly skewed 
in a positive direction, but, with the exception of one scoring method for 
the pyramidal test, the degree of skew was not statistically significant. 
Seeley, Morton and Anderson (1962) obtained a highly negatively skewed
distribution of scores on a pyramidal test. Their result, however, was possibly
due to the easiness of their test and/or to the exclusion of some lower-ability
examinees who did not carefully follow the instructions. Bayroff and Seeley's
(1967) results, however, were more similar to those found in the present
study; they obtained a normal distribution of pyramidal scores when computer 
administration was employed. Larkin and Weiss (1974) found a tendency toward 
positive skew in two other pyramidal tests similar to the one used here. 

In their previous empirical study of two-stage testing, Betz and Weiss 
(1973) obtained score distributions which also tended toward positive skew but 
were not significantly different from a normal distribution. The two-stage 
simulation (Betz & Weiss, 1974) showed score distributions to have almost 
zero skew (-.04) when administered to a population distributed normally on 
ability.






There was a slight, but non-significant, trend for most pyramidal score
distributions to be platykurtic. The tendency toward flatness in score
distributions from pyramidal tests has been noted by Hansen (1969), who obtained
a rectangular score distribution. Two similar pyramidal tests of Larkin and
Weiss (1974) were significantly flat. The two-stage score distribution in
the present study was slightly (and non-significantly) leptokurtic. Betz and
Weiss (1973), however, found that a similar two-stage test produced a slightly
flattened distribution of scores. With simulated data, Betz and Weiss (1974)
found that score distributions on the same two-stage test used in the present
study were significantly flat (p<.01), but less platykurtic than distributions
of scores for another two-stage test and a conventional test. Results of the
present study, however, showed that the mean-difficulty-correct score derived
from the pyramidal test deviated least from a normal distribution of all
the pyramidal and two-stage test scores.

The distributions of scores within the two-stage measurement subtests
represented an improvement over those obtained in the previous study of
two-stage testing. First, the number of testees assigned to each measurement test
was more nearly equal. Betz and Weiss (1973) found that approximately half
of the subjects completing their two-stage test were routed to the most
difficult measurement test. Further, the easier measurement tests in the previous
study were found to be too easy for the testees routed to them. The more even
distribution of testees across the measurement tests in the present
investigation can be attributed to the more appropriate difficulty of the
routing test and to the revised procedure used to determine cutting scores for
assignment to measurement tests. The improvement in the score distributions
within the measurement tests is due to modifications that made the more difficult
measurement tests easier and the less difficult measurement tests more
difficult.

The misclassification rate for the two-stage test in this study was .009,
using the same criteria as Betz and Weiss (1973), i.e., perfect
scores (30) or scores at or below chance (6 or less) on the measurement tests. This compared
favorably with the 5% misclassification rate in Betz and Weiss (1973).
The 20% rates obtained by Angoff and Huddleston (1958) and by Cleary et al.
(1969a, b; Linn et al., 1969) were due primarily to the different
misclassification criteria in their real-data simulation studies. The low rate of
misclassification in the present study may be accounted for by (1) the more
accurate assignment of subjects to measurement tests brought about by revisions
in the routing test, (2) the maximum likelihood procedure used for
classification, (3) the increased cutting scores (no testee was routed to a measurement
test in which he/she obtained a perfect score), and (4) the more appropriate
difficulties of the items used in the measurement tests.

The internal consistency reliabilities of the two-stage subtests also 
reflect the improvements in the difficulties of those subtests. For the 
routing test and three of the four measurement tests, measures of internal 
consistency were as much as .31 higher than the corresponding reliabilities 
found by Betz and Weiss (1973). This finding suggests that the difficulties 
of the measurement test items were more appropriate (i.e., approximating p=.5)
for the groups of subjects attempting them. The increased difficulty of the 
routing test items in the present study as compared to the previous empirical 
study resulted in routing test scores which had a standard deviation more 
than twice that found in the previous study. The changes made in the 




measurement tests, by decreasing the number of items which were much too
easy or too difficult for the group routed to them, enabled the interitem
correlations, and thus the internal consistency reliability coefficients, to
increase.

The correlations between scores on the pyramidal and two-stage tests
obtained in this study ranged from r=.79 to .84 (eta=.83 to .88). The two
previous empirical studies in this series found correlations of r=.82 to .89
(eta=.84 to .92) between scores on the pyramidal and conventional testing
strategies, and r=.80 to .84 (eta=.82 to .88) between scores on the two-stage
and conventional tests. In the simulation study, Betz and Weiss (1974) found
a correlation of r=.82 (eta=.82) between scores on the two-stage and conventional
tests. Thus, it appears that scores on the two-stage test are almost as
highly related to scores on a 15-stage pyramidal test as they are to scores
on a 40-item conventional test. The relationship between scores on the two
adaptive tests is almost as high as that between the pyramidal and conventional
tests. In the two previous empirical studies, the items contained in the
adaptive and conventional tests were non-overlapping. The present study,
however, permitted some of the same items to be administered in both the
pyramidal and two-stage tests, which may have somewhat inflated the correlation
between them. An average of six items (40% of the pyramidal test's items)
were the same in both tests.

The correlations between scores on the two adaptive strategies approached
their empirical stabilities. The seven-week test-retest stability of the
pyramidal test used in this study ranged from r=.82 to .86 (eta=.85 to .90),
depending on the scoring method used (Larkin & Weiss, 1974). The stability of
scores on a two-stage test similar to the one used here was r=.88 (Betz &
Weiss, 1973). Using the Pearson coefficients, the correlation between the
two-stage and pyramidal tests accounted for 62% to 71% of the common variance.
The stabilities of the adaptive tests showed that from 67% to 74% of the pyramidal
test's variance was reliable, while about 77% of the variance of the two-stage
test was reliable. Thus, assuming that error variance is uncorrelated,
from 42% to 53% of the reliable variance in the pyramidal test was common to
the two-stage test, while from 48% to 55% of the reliable variance in the
two-stage test was common to the pyramidal test. Further, the correlation between
the two adaptive tests equalled or exceeded the internal consistency reliabilities
of all the measurement tests and approached the internal consistency of the
routing test when corrected for length.
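The first three sets of percentages in this paragraph are squared coefficients. The sketch below reproduces only those values, by squaring the reported correlations and stabilities. It does not attempt to reconstruct the subsequent "reliable variance" figures, whose exact derivation the text does not spell out.

```python
pyramid_vs_two_stage = (0.79, 0.84)   # range of product-moment r (Table 5)
pyramid_stability = (0.82, 0.86)      # Larkin & Weiss (1974)
two_stage_stability = 0.88            # Betz & Weiss (1973)


def percent_variance(r: float) -> int:
    """Squared coefficient expressed as a whole percentage."""
    return round(100 * r * r)


print([percent_variance(r) for r in pyramid_vs_two_stage])  # [62, 71]
print([percent_variance(r) for r in pyramid_stability])     # [67, 74]
print(percent_variance(two_stage_stability))                # 77
```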

Several tentative conclusions can be drawn from these results. First,
the results replicate previous findings indicating that the order of
administration of adaptive tests does not significantly affect scores on the
tests. Consequently, research on different adaptive strategies can proceed
by administering two or more strategies successively to an individual without
randomizing administration order.

The results seem to support previous findings by Lord (1970) and Larkin
and Weiss (1974) which indicate that the average difficulty scores are the
most useful way of scoring pyramidal tests. Lord's results indicate that his
average difficulty score provides the most desirable information functions,
while Larkin and Weiss' results indicate that these scores are the most stable
over short time intervals. In the present study, the mean-difficulty-correct
score gave results which deviated least from a normal distribution.




Although the distribution of ability in the subjects was unknown, this agreement
of results across these studies implies that it is not unreasonable to
assume that it was normal. Further research is needed, however, with
populations of known distribution of ability, to support this assumption.
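The average-difficulty scores discussed above can be illustrated with a small sketch. It assumes a pyramid with equally spaced item difficulties; the starting difficulty of 0.0 and the fixed step are both hypothetical. For one response record, it computes three scores:

- the difficulty of the item that would come next;
- the mean difficulty of all items administered;
- the mean difficulty of the items answered correctly.

These definitions are illustrative. The exact scoring formulas compared in this report and in Larkin and Weiss (1974) may differ in detail.

```python
from statistics import mean


def pyramidal_path(responses, step=0.1):
    """Difficulties of the items administered in a hypothetical pyramidal test.

    Assumed layout: the first item has difficulty 0.0, and each stage moves to
    an item `step` harder after a correct answer or `step` easier after an
    incorrect one. Returns the administered difficulties and the difficulty of
    the item that would have come next.
    """
    difficulties = []
    b = 0.0
    for correct in responses:
        difficulties.append(b)
        b += step if correct else -step
    return difficulties, b


def pyramidal_scores(responses, step=0.1):
    """Three illustrative difficulty-based scores for one examinee."""
    difficulties, next_b = pyramidal_path(responses, step)
    correct_bs = [b for b, ok in zip(difficulties, responses) if ok]
    return {
        "final_difficulty": next_b,
        "mean_difficulty_all": mean(difficulties),
        # undefined when no item was answered correctly
        "mean_difficulty_correct": mean(correct_bs) if correct_bs else None,
    }


# Hypothetical responses to a 15-stage pyramid (True = correct).
answers = [True, True, False, True, False, False, True, True,
           True, False, True, False, True, True, False]
print(pyramidal_scores(answers))
```

The mean-difficulty-correct score is undefined for an examinee who answers nothing correctly. Any operational scoring rule needs a convention for that case.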

The data on score means for the two adaptive strategies suggest that few
chance successes occurred, on the average, as the result of guessing. These
results support Hansen's (1969) finding that decreases in guessing do occur
when item difficulties are adapted to each individual's ability level. There
was a suggestion in the data that the pyramidal strategy resulted in
fewer chance successes due to guessing than did the two-stage strategy. This
finding should also be further studied by research designed specifically to
answer that question.
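The knowledge-or-random-guessing model offers one way to see why adapting difficulty should reduce chance successes. It is sketched below as an assumption, not as the analysis used in the report. If an examinee guesses blindly only on items they do not know, the expected number of chance successes is the sum over items of (1 - P(know)) / k for k-option items. The logistic "knowledge" curve, the five-option default, and the example difficulties are all hypothetical.

```python
from math import exp


def p_know(theta, b):
    """Hypothetical probability that an examinee of ability theta actually
    knows an item of difficulty b (a one-parameter logistic curve)."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def expected_chance_successes(theta, difficulties, n_options=5):
    """Expected number of correct answers obtained by blind guessing.

    Knowledge-or-random-guessing model: the examinee guesses only on items
    they do not know, and a blind guess succeeds with probability 1/n_options.
    """
    return sum((1.0 - p_know(theta, b)) / n_options for b in difficulties)


theta = 0.0
adapted = [0.0] * 15   # difficulties matched to the examinee's ability
too_hard = [2.0] * 15  # a fixed set of items well above the examinee's level
print(round(expected_chance_successes(theta, adapted), 3))
print(round(expected_chance_successes(theta, too_hard), 3))
```

Under these assumptions, items matched to ability yield an expected 1.5 chance successes over 15 items. On items two units too hard, the expected count rises to about 2.6.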

Finally, the results suggest that the two adaptive strategies are not
interchangeable: they do not measure the same variable in the same way.
When the correlation between scores on the two adaptive strategies
was considered with respect to available data on the reliabilities of the
strategies, only about 50% of the reliable variance of the two strategies
was found to be common. Thus, each strategy orders individuals differently
on estimated ability. Further research is needed to determine the reasons
for these different ability estimates.

One deficiency of the present study concerns the determination of the
relative efficiency of the two testing strategies. The use of live subjects
does not permit any estimation of the precision or accuracy of the scores
obtained under either strategy, since the "true" ability of the testees was,
of course, unknown. Thus, the degree to which test scores accurately reflected
underlying ability could not be determined. Live-testing empirical studies
designed to answer this question will require very large samples of testees.
Theoretical studies, as shown by Weiss and Betz (1974), appear to provide
results which are not generalizable beyond the conditions satisfying their
restrictive assumptions. Thus, additional simulation studies (e.g., Betz &
Weiss, 1974) seem to be necessary to determine which adaptive tests, scored by
which methods, provide the most accurate measurement for testees of various ability
levels. The simulation studies should then be followed by live-testing
studies to validate the simulation findings.
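A simulated examinee's true ability is known, so a simulation can score a strategy against it, which the live-testing design above could not. The sketch below illustrates that idea; it is not a reproduction of Betz and Weiss (1974). All of the following are assumptions chosen for brevity:

- the one-parameter logistic model;
- the 10-item routing test;
- the three 30-item measurement tests;
- the routing cutoffs;
- the crude logit scoring rule.

```python
import random
from math import exp, log, sqrt
from statistics import mean


def p_correct(theta, b):
    """One-parameter logistic item response function (an assumed model)."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def number_right(theta, difficulties, rng):
    """Simulate one examinee's number-correct score on a set of items."""
    return sum(rng.random() < p_correct(theta, b) for b in difficulties)


def logit_estimate(n_right, difficulties):
    """Crude ability estimate: mean item difficulty plus the logit of the
    proportion correct, with zero and perfect scores pulled in by half an item."""
    n = len(difficulties)
    p = min(max(n_right, 0.5), n - 0.5) / n
    return mean(difficulties) + log(p / (1 - p))


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def simulate(n_examinees=500, seed=1):
    """Hypothetical two-stage design: a 10-item routing test, then one of
    three 30-item measurement tests chosen by the routing score."""
    rng = random.Random(seed)
    routing = [0.0] * 10
    measurement = [[b] * 30 for b in (-1.0, 0.0, 1.0)]
    cutoffs = (3, 6)  # routing score 0-3 -> easy, 4-6 -> middle, 7-10 -> hard
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    estimates, errors = [], {}
    for theta in thetas:
        routed = number_right(theta, routing, rng)
        level = sum(routed > c for c in cutoffs)
        items = measurement[level]
        est = logit_estimate(number_right(theta, items, rng), items)
        estimates.append(est)
        band = "low" if theta < -1 else "high" if theta > 1 else "middle"
        errors.setdefault(band, []).append(abs(est - theta))
    mean_abs_error = {band: mean(errs) for band, errs in errors.items()}
    return pearson(thetas, estimates), mean_abs_error


r, mae = simulate()
print(f"r(true ability, estimate) = {r:.2f}")
print({band: round(err, 2) for band, err in mae.items()})
```

The paragraph calls for comparing strategies and scoring methods. Here, that means swapping in a different routing rule or scoring function, then comparing both the correlation with true ability and the errors within each ability band.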






References 



Angoff, W.H. & Huddleston, E.M. The multi-level experiment: a study of a
two-level test system for the College Board Scholastic Aptitude Test. 
Princeton, New Jersey: Educational Testing Service, Statistical Report 
SR-58-21, 1958. 

Bayroff, A.G. Psychometric problems with branching tests. Paper presented
at the meeting of the American Psychological Association, Division 5, 
September, 1969. 

Bayroff, A.G. & Seeley, L.C. An exploratory study of branching tests.
U.S. Army Behavioral Science Research Laboratory, Technical Research
Note 188, June, 1967.

Bayroff, A.G., Thomas, J.J. & Anderson, A.A. Construction of an experimental
sequential item test. Research memorandum 60-1, Personnel Research Branch,
Department of the Army, January, 1960.

Betz, N.E. & Weiss, D.J. An empirical study of computer-administered two-stage 
ability testing. Research Report 73-4, Psychometric Methods Program, 
Department of Psychology, University of Minnesota, October, 1973. 
(AD 768993) 

Betz, N.E. & Weiss, D.J. Simulation studies of two-stage ability testing. 
Research Report 74-4, Psychometric Methods Program, Department of 
Psychology, University of Minnesota, Minneapolis, 1974. (AD A001230) 

Cleary, T.A., Linn, R.L. & Rock, D.A. An exploratory study of programmed tests. 
Educational and Psychological Measurement, 1968, 28, 345-360. (a)

Cleary, T.A., Linn, R.L. & Rock, D.A. Reproduction of total test score through
the use of sequential programmed tests. Journal of Educational Measurement,
1968, 1, 183-187. (b)

DeWitt, L.J. & Weiss, D.J. A computer software system for adaptive ability 
measurement. Research Report 74-1, Psychometric Methods Program, 
Department of Psychology, University of Minnesota, Minneapolis, 1974. 
(AD 773961) 

Hansen, D.N. An investigation of computer-based science testing. In R.C. 
Atkinson and H.A. Wilson (eds.). Computer-assisted instruction: a book
of readings. New York: Academic Press, 1969.

Hoyt, C.J. Test reliability estimated by analysis of variance. Psychometrika,
1941, 3, 153-160. 

Larkin, K.C. & Weiss, D.J. An empirical investigation of computer-administered 
pyramidal ability testing. Research Report 74-3, Psychometric Methods 
Program, Department of Psychology, University of Minnesota, Minneapolis, 
1974. (AD 783553) 






Linn, R.L., Rock, D.A. & Cleary, T.A. The development and evaluation of
several programmed testing methods. Educational and Psychological
Measurement, 1969, 29, 129-146.

Lord, F.M. Some test theory for tailored testing. In W.H. Holtzman (ed.).
Computer-assisted instruction, testing, and guidance. New York: Harper
and Row, 1970.

Lord, F.M. Robbins-Monro procedures for tailored testing. Educational and
Psychological Measurement, 1971, 31, 3-31. (a)

Lord, F.M. Tailored testing, an application of stochastic approximation.
Journal of the American Statistical Association, 1971, 66, 707-711. (b)

Lord, F.M. A theoretical study of two-stage testing. Psychometrika, 1971,
36, 227-241. (c)

Lord, F.M. & Novick, M.R. Statistical theories of mental test scores.
Reading, Mass.: Addison-Wesley, 1968.

McBride, J.R. & Weiss, D.J. A word knowledge item pool for adaptive ability 
measurement. Research Report 74-2, Psychometric Methods Program, 
Department of Psychology, University of Minnesota, Minneapolis, 1974. 
(AD 781894) 

McNemar, Q. Psychological statistics (4th ed.). New York: Wiley, 1969. 

Paterson, J.J. An evaluation of the sequential method of psychological
testing. Unpublished doctoral dissertation, Michigan State University,
1962.

Seeley, L.C., Morton, M.A. & Anderson, A.A. Exploratory study of a sequential
item test. U.S. Army Personnel Research Office, Technical Research Note 
129, 1962. 

Waters, C.W. Preliminary evaluation of simulated branching tests. U.S. Army 
Personnel Research Office, Technical Research Note 140, 1964. 

Waters, C.W. & Bayroff, A.G. A comparison of computer-simulated conventional
and branching tests. Educational and Psychological Measurement, 1971,
31, 125-136. 

Weiss, D.J. The stratified adaptive computerized ability test. Research 
Report 73-3, Psychometric Methods Program, Department of Psychology,
University of Minnesota, Minneapolis, 1973. (AD 768376) 

Weiss, D.J. Strategies of adaptive ability measurement. Research Report 74-5,
Psychometric Methods Program, Department of Psychology, University of 
Minnesota, Minneapolis, 1974. 

Weiss, D.J. & Betz, N.E. Ability measurement: conventional or adaptive?
Research Report 73-1, Psychometric Methods Program, Department of Psychology,
University of Minnesota, Minneapolis, 1973. (AD 757788)






DISTRIBUTION LIST 



Navy 

4 Dr. Marshall J. Farr, Director

Personnel and Training Research Programs
Office of Naval Research (Code 458)
Arlington, VA 22217

1 ONR Branch Office
495 Summer Street
Boston, MA 02210
ATTN: J. Lester

1 ONR Branch Office
1030 East Green Street
Pasadena, CA 91101
ATTN: E.E. Gloye

1 ONR Branch Office

536 South Clark Street
Chicago, IL 60605
ATTN: M.A. Bertin

6 Director

Naval Research Laboratory
Code 2627 

Washington, DC 20390 

12 Defense Documentation Center
Cameron Station, Building 5
5010 Duke Street

Alexandria, VA 22314

1 Special Assistant for Manpower
OASN (M&RA)
Pentagon, Room 4E794
Washington, DC 20350

1 LCDR Charles J. Theisen, Jr., MSC, USN
4024

Naval Air Development Center
Warminster, PA 18974 

1 Chief of Naval Reserve 
Code 3055

New Orleans, LA 70146 

1 Dr. Leo Miller 

Naval Air Systems Command 
AIR-a3E 

Washington, DC 20361

1 CAPT John F. Riley, USN
Commanding Officer
U.S. Naval Amphibious School
Coronado, CA 92155

1 Chief 

Bureau of Medicine & Surgery 
Research Division (Code 713) 
Washington, DC 20372 

1 Chairman 

Behavioral Science Department 
Naval Command & Management Division 
U.S. Naval Academy 
Luce Hall 

Annapolis, MD 21402 

1 Chief of Naval Education & Training 
Naval Air Station
Pensacola, FL 32508

ATTN: CAPT Bruce Stone, USN 



1 Mr. Arnold Rubinstein 

Naval Material Command (NAVMAT 03424) 
Room 820, Crystal Plaza ffS 
Washington, DC 20360 

1 Commanding Officer 

Navy Medical Neuropsychiatric

Research Unit 
San Diego, CA 92152 

1 Director, Navy Occupational Task 

Analysis Program (NOTAP) 
Navy Personnel Program Support 

Activity 
Building 1304, Bolling AFB
Washington, DC 20336 

1 Dr. Richard J. Niehaus 

Office of Civilian Manpower Management 
Code 06(1 

Washington, DC 20390 

1 Department of the Navy 

Office of Civilian Manpower Management 
Code 263 

Washington, DC 20390

1 Chief of Naval Operations (OP-937E)
Department of the Navy
Washington, DC 20350

1 Superintendent 

Naval Postgraduate School 
Monterey, CA 93940 
ATTN: Library (Code 2124)

1 Commander, Navy Recruiting Command
4015 Wilson Boulevard 
Arlington, VA 22203 
ATTN: Code 015 

1 Mr. George N. Graine

Naval Ship Systems Command 
SHIPS 047C12 
Washington, DC 20362 

1 Chief of Naval Technical Training 
Naval Air Station Memphis (75) 
Millington, TN 38054 
ATTN: Dr. Norman J. Kerr 

1 Dr. William L. Maloy 

Principal Civilian Advisor 
for Education & Training 
Naval Training Command, Code 01A
Pensacola, FL 32508 

1 Dr. Alfred F. Smode, Staff Consultant
Training Analysis & Evaluation Group
Naval Training Equipment Center
Code N-00T
Orlando, FL 32813 

1 Dr. Hanns H. Wolff 

Technical Director (Code N-2) 
Naval Training Equipment Center
Orlando, FL 32813 






Chief of Naval Training Support 
Code N-21 
Building 45
Naval Air Station 
Pensacola, FL 32508 



Air Force 

1 Research Branch 
AP/DW/AR 
Randolph AFB, TX 78148



Navy Personnel R&D Center
San Diego, CA 92152

Navy Personnel R&D Center
San Diego, CA 92152
ATTN: Code 10 



Headquarters 

U.S. Army Administration Center
Personnel Administration Combat

Development Activity
ATCP-HRO

Ft. Benjamin Harrison, IN 46249

Armed Forces Staff College 
Norfolk, VA 23511 
ATTN: Library

Commandant
United States Army Infantry School
ATTN: ATSH-DST
Fort Benning, GA 31905 

Deputy Commander 

U.S. Army Institute of Administration
Fort Benjamin Harrison, IN 46216



Dr. G.A. Eckstrand (AFHRL/AS) 
Wright-Patterson AFB 
Ohio 45433

Dr. Ross L. Morgan (AFHRL/AST)
Wright-Patterson AFB
Ohio 45433

AFHRL/DOJN 
Stop #63 

Lackland AFB, TX 78236 

Dr. Robert A. Bottenberg (AFHRL)
Stop #63
Lackland AFB, TX 78236

Dr. Martin Rockway (AFHRL/TT)
Lowry AFB
Colorado 80230 

Major P.J. DeLeo
Instructional Technology Branch
AF Human Resources Laboratory
Lowry AFB, CO 80230

AFOSR/NL

1400 Wilson Boulevard
Arlington, VA 22209 

Commandant

USAF School of Aerospace Medicine
Aeromedical Library (SUL-4)
Brooks AFB, TX 78235 



ATTN: EA 

Dr. Stanley L. Cohen 
U.S. Army Research Institute
1300 Wilson Boulevard
Arlington, VA 22209

Dr. Ralph Dusek 
U.S. Army Research Institute
1300 Wilson Boulevard 
Arlington, VA 22209 

Dr. Edmund F. Fuchs
U.S. Army Research Institute
1300 Wilson Boulevard
Arlington, VA 

1 Dr. J.E. Uhlaner, Technical Director 
U.S. Army Research Institute 
1300 Wilson Boulevard
Arlington, VA 22209 

Dr. Joseph Ward 
U.S. Army Research Institute
1300 Wilson Boulevard 
Arlington, VA 22209 

1 HQ USAREUR & 7th Army
ODCSOPS

USAREUR Director of GED
APO New York 09403



1 Mr. B.A. Dover 

Manpower Measurement Unit (Code MPI)
Arlington Annex, Room 2A13 
Arlington, VA 20330 

1 Commandant of the Marine Corps
Headquarters, U.S. Marine Corps
Code MPI-20 

Washington, DC 20380 


1 Director, Office of Manpower Utilization
Headquarters, Marine Corps (Code MPU)
MCB (Building 2009) 
Quantico, VA 22134 

1 Dr. A.L. Slafkosky 

Scientific Advisor (Code RD-1)
Headquarters, U.S. Marine Corps
Washington, DC 20380 

Coast Guard

1 Mr. Joseph J. Cowan, Chief

Psychological Research Branch
U.S. Coast Guard Headquarters
Washington, DC 20590



1 Lt. Col. Henry L. Taylor, USAF 

Military Assistant for Human Resources 
OAD (E&LS) ODDR&E
Pentagon, Room 3D129 
Washington, DC 20301 

1 Col. Austin W. Kibler

Advanced Research Projects Agency
Human Resources Research Office
1400 Wilson Boulevard
Arlington, VA 22209

Other Government 

1 Dr. Lorraine D. Eyde

Personnel Research and Development
Center

U.S. Civil Service Commission 
1900 E. Street, N.W. 
Washington, DC 20415 

1 Dr. William Gorham, Director

Personnel Research and Development
Center

U.S. Civil Service Commission
1900 E. Street, N.W.
Washington, DC 20415

1 Dr. Vern Urry

Personnel Research and Development
Center

U.S. Civil Service Commission
1900 E. Street, N.W.
Washington, DC 20415

1 Dr. Eric McWilliams, Program Manager
Technology and Systems, HE 
National Science Foundation 
Washington, DC 20550 

1 Dr. Andrew R. Molnar

Technological Innovations in Education
National Science Foundation 
Washington, DC 20550 

1 U.S. Civil Service Commission 
Federal Office Bldg. 
Chicago Regional Staff Div. 
Attn: C. S. 'aniewicz 
Regional Psychologist 
230 So. Dearborn St. 
Chicago, IL 60604 



Miscellaneous

1 Dr. Scarvia B. Anderson 
Educational Testing Service
17 Executive Park Drive, N.E. 
Atlanta, GA 30329 



Dr. John Annett 
The Open University 
Milton Keynes 
Buckinghamshire 
ENGLAND 






1 Dr. Richard C. Atkinson 
Stanford University 
Department of Psychology 
Stanford, CA 94305 

1 Dr. Gerald V. Barrett 
University of Akron 
Department of Psychology
Akron, OH 44325

1 Dr. Bernard M. Bass 
University of Rochester 
Management Research Center 
Rochester, NY 14627

1 Mr. Kenneth M. Bromberg 
Manager - Washington Operations
Information Concepts, Inc.
1701 North Fort Myer Drive
Arlington, VA 22209 



1 Dr. Robert Glaser, Director 
University of Pittsburgh 
Learning Research & Development Center
Pittsburgh, PA 15213

1 Mr. Harry H. Harman

Educational Testing Service
Princeton, NJ 08540

1 Dr. Richard S. Hatch 

Decision Systems Associates, Inc. 
rU28 Rockville Pike 
Rockville, MD 20852 

1 Dr. M.D. Havron 

Human Sciences Research, Inc. 
7710 Old Spring House Road 
West Gate Industrial Park 
McLean, VA 22101 



1 Mr. Luigi Petrullo

2431 North Edgewood Street
Arlington, VA 22207 

1 Dr. Diane M. Ramsey-Klee
R-K Research & System Design
3947 Ridgemont Drive 
Malibu, CA 90265 

1 Dr. Joseph W. Rigney 

University of Southern California 
Behavioral Technology Laboratories
3717 South Grand 
Los Angeles, CA 90007 

1 Dr. Leonard L. Rosenbaum, Chairman 
Montgomery College 
Department of Psychology 
Rockville, MD 20850 

1 Dr. George E. Rowland
Rowland and Company, Inc.
P.O. Box 61

Haddonfield, NJ 08033

1 Dr. Arthur I. Siegel

Applied Psychological Services
404 East Lancaster Avenue
Wayne, PA 19087 

1 Dr. C. Harold Stone 
1428 Virginia Avenue 
Glendale, CA 91202

1 Mr. Dennis J. Sullivan 
725 Benson Way 
Thousand Oaks, CA 91360 

1 Dr. Benton J. Underwood
Northwestern University
Department of Psychology 
Evanston, IL 60201 

1 Carl R. Vest, Ph.D. 
Battelle Memorial Institute
Washington Operations 
2030 - M Street, N.W. 
Washington, D.C. 20036 

1 Dr. Roger A. Kaufman 

United States International Univ. 
Graduate School of Leadership and

Human Behavior
Elliott Campus
8655 E. Pomerada Road
San Diego, CA 92124 

1 Dr. Anita West

Denver Research Institute
University of Denver 
Denver, CO 80210 



1 Dr. Kenneth E. Clark 
University of Rochester 
College of Arts & Sciences 
River Campus Station
Rochester, NY 14627

1 Dr. Ronald P. Carver
School of Education
University of Missouri - Kansas City
Kansas City, Missouri 64110 



1 Gentry Research Corporation 
4113 Lee Highway 
Arlington, VA 22207 

1 Dr. René V. Dawis

University of Minnesota 
Department of Psychology 
Minneapolis, MN 55455 

1 Dr. Norman R. Dixon 
Room 170 

190 Lothrop Street 
Pittsburgh, PA 15260

1 Dr. Robert Dubin 

University of California 
Graduate School of Administration 
Irvine, CA 92664 

1 Dr. Marvin D. Dunnette
University of Minnesota
Department of Psychology
Minneapolis, MN 55455 

1 ERIC 

Processing and Reference Facility 
4833 Rugby Avenue
Bethesda, MD 20014

1 Dr. Victor Fields 
Montgomery College
Department of Psychology 
Rockville, MD 20850 

1 Dr. Edwin A. Fleishman

American Institutes for Research
Foxhall Square
3301 New Mexico Avenue, N.W.
Washington, DC 20016 






1 HumRRO 

Division No. 3 
P.O. Box 5787 

Presidio of Monterey, CA 93940

1 HumRRO 

Division No. 6, Library 
P.O. Box 428
Fort Rucker, AL 36360

1 Dr. Lawrence B. Johnson

Lawrence Johnson & Associates, Inc. 
200 S. Street, N.W., Suite 502 
Washington, DC 20009 

1 Dr. Milton S. Katz 
MITRE Corporation 
Westgate Research Center
McLean, VA 22101 

1 Dr. Steven W. Keele 
University of Oregon 
Department of Psychology 
Eugene, OR 97403 

1 Dr. David Klahr 

Carnegie-Mellon University
Department of Psychology 
Pittsburgh, PA 15213 

1 Dr. Frederick M. Lord

Educational Testing Service 
Princeton, NJ 08540 

1 Dr. Ernest J. McCormick 
Purdue University 

Department of Psychological Sciences 
Lafayette, IN 47907

1 Dr. Robert R. Mackie 

Human Factors Research, Inc.
6780 Cortona Drive
Santa Barbara Research Park
Goleta, CA 93017 

1 Mr. Edmond Marks 
405 Old Main

Pennsylvania State University
University Park, PA 16802

1 Dr. Leo Munday, Vice-President
American College Testing Program
P.O. Box 166
Iowa City, IA 52240



Previous Reports in this Series



73-1. Weiss, D.J. & Betz, N.E. Ability Measurement: Conventional or Adaptive?
February 1973 (AD 757788).

73-2. Bejar, I.I. & Weiss, D.J. Comparison of Four Empirical Differential
Item Scoring Procedures. August 1973.

73-3. Weiss, D.J. The Stratified Adaptive Computerized Ability Test.
September 1973 (AD 768376).

73-4. Betz, N.E. & Weiss, D.J. An Empirical Study of Computer-Administered
Two-stage Ability Testing. October 1973 (AD 768993).

74-1. DeWitt, L.J. & Weiss, D.J. A Computer Software System for Adaptive
Ability Measurement. January 1974 (AD 773961).

74-2. McBride, J.R. & Weiss, D.J. A Word Knowledge Item Pool for Adaptive
Ability Measurement. June 1974 (AD 781894).

74-3. Larkin, K.C. & Weiss, D.J. An Empirical Investigation of
Computer-Administered Pyramidal Ability Testing. July 1974 (AD 783553).

74-4. Betz, N.E. & Weiss, D.J. Simulation Studies of Two-stage Ability
Testing. October 1974 (AD A001230).

74-5. Weiss, D.J. Strategies of Adaptive Ability Measurement. December 1974.



AD numbers are those assigned by the Defense Documentation Center
for retrieval through the National Technical Information Service.



Copies of these reports are available, while supplies last, from: 

Psychometric Methods Program 
Department of Psychology 
University of Minnesota 
Minneapolis, Minnesota 55455 






